back-propagation network
Comparing Biases for Minimal Network Construction with Back-Propagation
This approach can be used to dynamically select the number of hidden units. The method Rumelhart suggests involves adding penalty terms to the usual error function. In this paper we introduce Rumelhart's minimal networks idea and compare two possible biases on the weight search space. These biases are compared in both simple counting problems and a speech recognition problem.
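As a rough illustration of what "adding penalty terms to the usual error function" can look like, the sketch below adds a weighted penalty on the weights to a sum-of-squares error. The two penalty forms shown (a quadratic one and a saturating one) are assumptions chosen for illustration; the abstract does not say which biases the paper compares, and the function names and the `strength` parameter are hypothetical.

```python
def squared_error(outputs, targets):
    """Usual sum-of-squares error between network outputs and targets."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))


def quadratic_penalty(weights):
    """Assumed bias 1: grows without bound, so large weights are punished most."""
    return sum(w * w for w in weights)


def saturating_penalty(weights):
    """Assumed bias 2: each weight contributes at most 1, so the cost levels off."""
    return sum(w * w / (1.0 + w * w) for w in weights)


def penalized_error(outputs, targets, weights, penalty, strength=0.01):
    """Error function with a penalty term; `strength` (hypothetical) sets the trade-off."""
    return squared_error(outputs, targets) + strength * penalty(weights)


# Toy values, not taken from any experiment.
weights = [0.1, -2.0, 0.0]
outputs, targets = [0.8, 0.2], [1.0, 0.0]

for name, penalty in [("quadratic", quadratic_penalty),
                      ("saturating", saturating_penalty)]:
    print(name, penalized_error(outputs, targets, weights, penalty))
```

The two forms differ mainly in how they treat large weights: the quadratic term keeps pushing them down, while the saturating term mostly pressures small weights toward zero and leaves large ones nearly free.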
Handwritten Digit Recognition with a Back-Propagation Network
We present an application of back-propagation networks to handwritten digit recognition. Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task. The input of the network consists of normalized images of isolated digits. The method has a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.
A practical Bayesian framework for back-propagation networks
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
Generalization Dynamics in LMS Trained Linear Networks
Recent progress in network design demonstrates that nonlinear feedforward neural networks can perform impressive pattern classification for a variety of real-world applications (e.g., Le Cun et al., 1990; Waibel et al., 1989). Various simulations and relationships between the neural network and machine learning theoretical literatures also suggest that too large a number of free parameters ("weight overfitting") could substantially reduce generalization performance.
Relaxation Networks for Large Supervised Learning Problems
Alspector, Joshua, Allen, Robert B., Jayakumar, Anthony, Zeppenfeld, Torsten, Meir, Ronny
Feedback connections are required so that the teacher signal on the output neurons can modify weights during supervised learning. Relaxation methods are needed for learning static patterns with full-time feedback connections. Feedback network learning techniques have not achieved wide popularity because of the still greater computational efficiency of back-propagation. We show by simulation that relaxation networks of the kind we are implementing in VLSI are capable of learning large problems just like back-propagation networks. A microchip incorporates deterministic mean-field theory learning as well as stochastic Boltzmann learning. A multiple-chip electronic system implementing these networks will make high-speed parallel learning in them feasible in the future.
Dynamic Behavior of Constrained Back-Propagation Networks
It is generally admitted that the generalization performance of back-propagation networks (Rumelhart, Hinton & Williams, 1986) will depend on the relative size of the training data and of the trained network. By analogy to curve-fitting and for theoretical considerations, the generalization performance of the network should decrease as the size of the network and the associated number of degrees of freedom increase (Rumelhart, 1987; Denker et al., 1987; Hanson & Pratt, 1989). This paper examines the dynamics of the standard back-propagation algorithm (BP) and of a constrained back-propagation variation (CBP), designed to adapt the size of the network to the training data base. The performance, learning dynamics and the representations resulting from the two algorithms are compared.